Extraction of Informative Expressions from Domain-specific Documents

Authors

  • Eiko Yamamoto
  • Hitoshi Isahara
  • Akira Terada
  • Yasunori Abe
Abstract

What kinds of lexical resources are helpful for extracting useful information from domain-specific documents? Although such documents contain much useful knowledge, it is not obvious how to extract that knowledge efficiently. We need techniques for extracting hidden information from domain-specific documents. These techniques need not use state-of-the-art technologies or achieve deep and accurate language understanding; instead, they rely on large amounts of linguistic resources, such as domain-specific lexical databases. In this paper, we introduce two techniques for extracting informative expressions from documents: the extraction of related words that are related thematically as well as taxonomically, and the acquisition of salient terms and phrases. Using these techniques, we attempt to extract domain-specific informative expressions automatically and statistically, taking aviation documents as an example, and we evaluate the results.

Similar resources

Automatic Extraction of Time Expressions Accross Domains in French Narratives

The prevalence of temporal references across all types of natural language utterances makes temporal analysis a key issue in Natural Language Processing. This work addresses three research questions: 1/ is temporal expression recognition specific to a particular domain? 2/ if so, can we characterize domain specificity? and 3/ how can subdomain specificity be integrated in a single tool for unified ...

Syntactic Folding and its Application to the Information Extraction from Web Pages

The paper deals with investigations concerning potential structures of documents that will be subject to automated information extraction. The focus is on folding principles and their influence on the recognition of certain data in a document undergoing the extraction. Introduction The topic of our work is information extraction from the Internet. There are a couple of approaches which deal wit...

Cultural Frame and Translation of Pronominal Adverbs in Legal English

This paper explores the relationship between cultural knowledge and the specific meaning of a pronominal adverb in legal English where Chinese translators need to get the correct translation in their venture into translating the language of law. On the one hand, relying on the relevant legal cultural knowledge functioning as domain-general reference within a community or jurisdiction, tra...

Leveraging Giant Text Corpora to Enhance the Coverage of Pattern-based Information Extraction Systems

Pattern-based approaches for Information Extraction typically apply a pattern learner to a set of domain-specific documents to generate extraction patterns that comprise the IE system. This limits the coverage of the system to the expressions and language constructs used within the training data. This research exploits the vast quantities of text readily available in large corpora, such as The ...

Automatic Acquisition of Semantics-Extraction Patterns

This paper examines the use of parallel and comparable corpora for automatic acquisition of semantics-extraction patterns. It presents a new method of the pattern extraction which takes advantage of parallel texts to “port” text mining solutions from a source language to a target language. It is shown that the technique can help in situations when the extraction procedure is to be applied in a ...

Publication date: 2008